I. Real Party in Interest



The real party in interest in this appeal is Google Inc., the assignee of this application. 
II. Related Appeals and Interferences

Appellants are not aware of any appeals, judicial proceedings, or interferences that 
will affect directly, will be affected directly by, or will otherwise have a bearing on, the 
decision in this appeal. 
III. Status of the Claims

The status of the claims is as follows: 

• Claims canceled: 1-11, 21-36, and 41.

• Claims withdrawn from consideration but not canceled: None.

• Claims pending: 12-20, 37-40, 42-58. 

• Claims rejected: 12-20, 37-40, 42-58. 

• Claims appealed: 12-20, 37-40, 42-58. 
IV. Status of Amendments

Appellants have proposed an amendment to independent claim 40 to correct a
problem with antecedent basis. As of the filing of this appeal brief, the amendment has not
been entered. All other amendments have been entered.

A copy of the appealed claims is attached as Section VIII, "Claims Appendix," 
including claim 40 in its original form. 

A marked-up version of claim 40 is attached as Section XI, "Proposed Amendment to 
Claim 40," to identify the proposed amendment to the claim. 
V. Summary of the Claimed Subject Matter



This application has six pending independent claims: claims 12, 18, 37, 40, 50, and 56.
The subject matter of each independent claim is summarized below.

A. The Subject Matter as Claimed in Independent Claim 12 

The claimed subject matter of claim 12 is a method of handling duplicate documents 
in a network crawling system. 1 In the claimed method, when there are duplicates, a
representative document is selected for the set of duplicates, and over time the representative 
may change. 2 

Initially, a set of tables is created that stores information about documents on a 
network. 3 This information includes what documents are duplicates, and the rank (or score) 
of each document. 4 The rank generally indicates the importance or popularity of each 
document. 5 

"Crawling" new documents comprises several operations. 6 Briefly, the method 
entails comparing the new document to an existing set of documents having the same 
content. 7 The new document becomes the representative for that content under certain 
circumstances. 8 If the new document becomes the representative, it is indexed. 9 

The claims and specification describe the crawling operation in more detail. A new 
document is received, and has two properties: its "document identifier" and a "document 



1 Application specification ¶ [0006].

2 Application specification ¶¶ [0006], [0008].

3 Application specification ¶¶ [0024], [0026], [0051], [0052]; elements 324, 326, 328, 340,
342, 344 in Fig. 3, elements 340, 342, and 344 in Fig. 4.

4 Application specification ¶¶ [0051], [0052]; see, e.g., elements 3410-1 to 3410-4 in Fig. 4.

5 Application specification ¶¶ [0007], [0044].

6 Application specification ¶¶ [0006] - [0008]; Figs. 5-10 (providing flowcharts).

7 Application specification ¶¶ [0006] - [0008]; elements 1470-20, 1470-30, 1470-50, and
1470-60 in Fig. 7.

8 Application specification ¶¶ [0006] - [0008], [0048]; elements 1470-30, 1470-50, 1470-60,
1470-70 of Fig. 7. The representative is sometimes referred to as the "canonical page."

9 Application specification ¶¶ [0069], [0070]. See also claim 12 and specification paragraphs
[0002] and [0023], indicating that indexing makes the page available to a search engine.
rank." 10 The document identifier identifies the content of the document. 11 For example, the
specification teaches using a 64-bit "content fingerprint." 12 The document rank is a numeric
ranking of the document compared to other documents on the network. 13
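For illustration only, the two document properties described above can be sketched in Python. The sketch assumes BLAKE2b truncated to 8 bytes as the 64-bit content fingerprint function; the specification is relied on only for the 64-bit size. The `CrawledDocument` record, its `url` field, and the `make_document` helper are hypothetical names, not elements of the claims.

```python
import hashlib
from dataclasses import dataclass


def content_fingerprint(content: bytes) -> int:
    """Return a 64-bit fingerprint of a document's content.

    Assumption: BLAKE2b with an 8-byte digest stands in for whatever
    fingerprint function the specification actually uses.
    """
    digest = hashlib.blake2b(content, digest_size=8).digest()  # 8 bytes = 64 bits
    return int.from_bytes(digest, "big")


@dataclass(frozen=True)
class CrawledDocument:
    url: str          # hypothetical field: where this copy was found
    fingerprint: int  # "document identifier": identifies the content
    rank: float       # "document rank": importance relative to other documents


def make_document(url: str, content: bytes, rank: float) -> CrawledDocument:
    return CrawledDocument(url, content_fingerprint(content), rank)


a = make_document("http://example.com/page", b"same body", 0.7)
b = make_document("http://mirror.example.org/page", b"same body", 0.2)
print(a.fingerprint == b.fingerprint)  # True: same content, same identifier
```

Two copies with identical content receive the same document identifier, while each copy keeps its own rank.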

Another operation is to identify other known documents that share the same content 
as the new document. 14 This data is read from the tables. 15 This set of documents has a 
representative document, which the claims refer to as the "original representative 
document." 16 

Once the new document and the set of documents are identified, the method calls for 
determining a representative of the enlarged set. 17 The enlarged set comprises the set of 
documents identified earlier, together with the new document. 18 The representative of the 
enlarged set may be different from the original representative document because the 
representative of the enlarged set may be the new document. 19 The specification provides 
examples of determining the representative based on the rankings of the documents. 20 One 
method also applies a hysteresis test so that a change in the representative occurs only when 
the new document is sufficiently better than the original representative document. 21 
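A minimal Python sketch of rank-based selection with hysteresis follows. It is an illustration under stated assumptions, not the specification's test: it assumes non-negative ranks, compares the new document only against the original representative document, and uses a hypothetical multiplicative margin named `hysteresis` (default 1.5). `Doc` and `representative_of_enlarged_set` are likewise hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    url: str
    rank: float  # assumed non-negative; higher means more important


def representative_of_enlarged_set(original_rep: Doc, new_doc: Doc,
                                   hysteresis: float = 1.5) -> Doc:
    """Return the representative after new_doc joins the duplicate set.

    Hypothetical hysteresis rule: new_doc displaces the original
    representative only if its rank exceeds the original's rank by the
    multiplicative margin `hysteresis`, so small rank fluctuations between
    crawls do not flip the representative back and forth.
    """
    if new_doc.rank > original_rep.rank * hysteresis:
        return new_doc
    return original_rep


rep = Doc("http://example.com/a", 0.50)
print(representative_of_enlarged_set(rep, Doc("http://example.com/b", 0.60)).url)
# http://example.com/a  (0.60 does not exceed 0.50 * 1.5 = 0.75)
print(representative_of_enlarged_set(rep, Doc("http://example.com/c", 0.90)).url)
# http://example.com/c  (0.90 exceeds 0.75)
```

The margin keeps the representative stable when two copies have nearly equal ranks, which is the purpose the brief attributes to the hysteresis test.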

The information in the tables is updated to include the new document and, potentially, to reflect
changes to other documents. 22 For example, if the document that was the original
representative is no longer the representative, the table is updated to show that fact. 23 
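The table update and the indexing step can be sketched together in Python. `DuplicateTable`, `Entry`, and `add_document` are hypothetical names; the sketch assumes an in-memory mapping from fingerprint to the known copies of that content, and assumes the first copy of any content is its representative. Whether the new document becomes the representative is taken as an input, for example the outcome of a hysteresis test.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Entry:
    rank: float
    canonical: bool = False  # True for the representative ("canonical page")


@dataclass
class DuplicateTable:
    # Hypothetical in-memory stand-in: fingerprint -> {url: Entry}
    groups: Dict[int, Dict[str, Entry]] = field(default_factory=dict)
    indexed: Set[str] = field(default_factory=set)  # URLs passed to the indexer

    def canonical_url(self, fp: int) -> Optional[str]:
        for url, entry in self.groups.get(fp, {}).items():
            if entry.canonical:
                return url
        return None

    def add_document(self, fp: int, url: str, rank: float,
                     becomes_representative: bool) -> None:
        group = self.groups.setdefault(fp, {})
        # Assumption: the first copy of a given content is trivially its representative.
        if becomes_representative or not group:
            for entry in group.values():
                entry.canonical = False  # unmark the original representative
            group[url] = Entry(rank, canonical=True)
            self.indexed.add(url)  # only a document that becomes the representative is indexed
        else:
            group[url] = Entry(rank)  # recorded as a duplicate, not indexed


table = DuplicateTable()
table.add_document(0xABC, "http://example.com/a", 0.5, becomes_representative=False)
table.add_document(0xABC, "http://example.com/b", 0.9, becomes_representative=True)
print(table.canonical_url(0xABC))  # http://example.com/b
```

After the second call, the entry for the original representative is unmarked, mirroring the "unmarking" of entries that are no longer the canonical page.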



10 Application specification ¶¶ [0047], [0062]; element 1450-2 in Fig. 5 and element 1470-10
in Fig. 7.

11 Application specification ¶¶ [0007], [0047].

12 Application specification ¶¶ [0007], [0047].

13 Application specification ¶¶ [0007], [0044].

14 Application specification ¶¶ [0007], [0067], [0068]; element 1470-20 in Fig. 7.

15 Application specification ¶¶ [0007], [0051] (data stored in a content fingerprint table
(CFT)), [0066] - [0068]; element 340 in Fig. 4.

16 Application specification ¶¶ [0067], [0068]; elements 3410-3 and 3420-2 in Fig. 4. See
also claim 12.

17 Application specification ¶¶ [0006], [0008]; elements 1470-30, 1470-60, 1470-65,
1470-70, and 1470-80 in Fig. 7.

18 Application specification ¶¶ [0006], [0008].

19 Application specification ¶¶ [0006], [0008]; element 1470-70 in Fig. 7.

20 Application specification ¶¶ [0008], [0067].

21 Application specification ¶¶ [0068], [0069]; element 1470-60 in Fig. 7.

22 Application specification ¶¶ [0069] - [0074]; elements 1470-60, 1470-65, and 1470-70 in
In addition, if the new document becomes the representative document, it is
indexed. 24 Indexing is the operation that makes the document available to a search engine. 25

The crawling operation described above is repeated for multiple documents, and in 
some cases the new document becomes the representative document. 26 When the new 
document becomes the representative document, the representative document has changed 
because the original representative document was from a set of documents that did not 
include the new document. 27 

B. The Subject Matter as Claimed in Independent Claim 18 

The claimed subject matter of claim 18 is similar to that of claim 12. Claim 18 is
directed to a method of handling duplicate documents in a network crawling system. 28 In the
claimed method, when there are duplicates, a representative document is selected for the set
of duplicates, and over time the representative may change. 29

Initially, a set of N+1 tables is created that stores information about documents on a
network, where N is an integer greater than one. 30 The N+1 tables comprise N tables, each
generated during a respective phase of a set of N crawling phases, and a current table 
generated during a current one of the N crawling phases, wherein an oldest one of the N 
tables was generated during a previous instance of the current crawling phase. 31 The 
information in the tables includes what documents are duplicates, and the rank (or score) of 
each document. 32 The rank generally indicates the importance or popularity of each 
document. 33 



Fig. 7, Fig. 9 (flowchart showing the table update operation).

23 Application specification ¶ [0070] (unmarking entries as necessary to show that they are no
longer the canonical pages).

24 Application specification ¶¶ [0069], [0070]. See also claim 12.

25 Application specification ¶¶ [0002], [0023].

26 See claim 12, last paragraph.

27 Application specification ¶ [0070]; element 1470-70 in Fig. 7.

28 Application specification ¶ [0006].

29 Application specification ¶¶ [0006], [0008].

30 Application specification ¶¶ [0024], [0026], [0051], [0052], [0078]; elements 324, 326,
328, 340, 342, 344 in Fig. 3, elements 340, 342, and 344 in Fig. 4.

31 Application specification ¶ [0078].

32 Application specification ¶¶ [0051], [0052]; see, e.g., elements 3410-1 to 3410-4 in Fig. 4.

33 Application specification ¶¶ [0007], [0044].



l-PA/3695090 



ATTORNEY DOCKET NO.: 060963-0005-US 
Application No.: 10/614,111 
Page 10 


"Crawling" new documents comprises several operations. 34 Briefly, the method
entails comparing the new document to an existing set of documents having the same 
content. 35 The new document becomes the representative for that content under certain 
circumstances. 36 If the new document becomes the representative, it is indexed. 37 

The claims and specification describe the crawling operation in more detail. A new 
document is received, and has two properties: its "document identifier" and a "document 
rank." 38 The document identifier identifies the content of the document. 39 For example, the 
specification teaches using a 64-bit "content fingerprint." 40 The document rank is a numeric
ranking of the document compared to other documents on the network. 41 

Another operation is to identify other known documents that share the same content 
as the new document. 42 This data is read from the N+1 tables. 43 This set of documents has a
representative document, which the claims refer to as the "original representative 
document." 44 



Once the new document and the set of documents are identified, the method calls for 
determining a representative of the enlarged set. 45 The enlarged set comprises the set of 
documents identified earlier, together with the new document. 46 The representative of the 



34 Application specification ¶¶ [0006] - [0008]; Figs. 5-10 (providing flowcharts).

35 Application specification ¶¶ [0006] - [0008]; elements 1470-20, 1470-30, 1470-50, and
1470-60 in Fig. 7.

36 Application specification ¶¶ [0006] - [0008], [0048]; elements 1470-30, 1470-50, 1470-60,
1470-70 of Fig. 7. The representative is sometimes referred to as the "canonical page."

37 Application specification ¶¶ [0069], [0070]. See also claim 12 and specification
paragraphs [0002] and [0023], indicating that indexing makes the page available to a search
engine.

38 Application specification ¶¶ [0047], [0062]; element 1450-2 in Fig. 5 and element 1470-10
in Fig. 7.

39 Application specification ¶¶ [0007], [0047].

40 Application specification ¶¶ [0007], [0047].

41 Application specification ¶¶ [0007], [0044].

42 Application specification ¶¶ [0007], [0067], [0068]; element 1470-20 in Fig. 7.

43 Application specification ¶¶ [0007], [0051] (data stored in a content fingerprint table
(CFT)), [0066] - [0068], [0079]; element 340 in Fig. 4.

44 Application specification ¶¶ [0067], [0068]; elements 3410-3 and 3420-2 in Fig. 4. See
also claim 12.

45 Application specification ¶¶ [0006], [0008]; elements 1470-30, 1470-60, 1470-65,
1470-70, and 1470-80 in Fig. 7.

46 Application specification ¶¶ [0006], [0008].






enlarged set may be different from the original representative document because the 
representative of the enlarged set may be the new document. 47 The specification provides 
examples of determining the representative based on the rankings of the documents. 48 One 
method also applies a hysteresis test so that a change in the representative occurs only when 
the new document is sufficiently better than the original representative document. 49 

The information in the tables is updated to include the new document and, potentially, to reflect
changes to other documents. 50 For example, if the document that was the original
representative is no longer the representative, the table is updated to show that fact. 51 

In addition, if the new document becomes the representative document, it is 
indexed. 52 Indexing is the operation that makes the document available to a search engine. 53 

The crawling operation described above is repeated for multiple documents, and in 
some cases the new document becomes the representative document. 54 When the new 
document becomes the representative document, the representative document has changed 
because the original representative document was from a set of documents that did not 
include the new document. 55 

Upon completion of a current crawling phase, the oldest one of the N tables is 
retired. 56 
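The N+1 tables and the retirement of the oldest table can be pictured as a fixed-length queue. In the following Python sketch, which is illustrative only, each table is a hypothetical mapping from fingerprint to {URL: rank}, and `PhasedTables`, `lookup`, and `finish_phase` are assumed names. A lookup reads all N+1 tables, and finishing the current phase moves the current table into the set of N tables while the oldest one is evicted.

```python
from collections import deque
from typing import Deque, Dict

Table = Dict[int, Dict[str, float]]  # hypothetical: fingerprint -> {url: rank}


class PhasedTables:
    """N tables from earlier crawling phases plus one current table."""

    def __init__(self, n: int) -> None:
        if n < 2:
            raise ValueError("N must be an integer greater than one")
        # Oldest table on the left; maxlen=n makes append() evict it automatically.
        self.tables: Deque[Table] = deque(({} for _ in range(n)), maxlen=n)
        self.current: Table = {}

    def lookup(self, fp: int) -> Dict[str, float]:
        """Collect known copies of content fp across all N+1 tables."""
        merged: Dict[str, float] = {}
        for table in list(self.tables) + [self.current]:  # oldest to newest
            merged.update(table.get(fp, {}))  # newer ranks overwrite older ones
        return merged

    def finish_phase(self) -> None:
        """The finished current table joins the N tables; the oldest is retired."""
        self.tables.append(self.current)
        self.current = {}


pt = PhasedTables(n=2)
pt.current[0xABC] = {"http://example.com/a": 0.5}
pt.finish_phase()
pt.current[0xABC] = {"http://example.com/b": 0.9}
print(sorted(pt.lookup(0xABC)))  # ['http://example.com/a', 'http://example.com/b']
pt.finish_phase()
pt.finish_phase()  # after two more phases, the table holding /a, now the oldest, is retired
print(sorted(pt.lookup(0xABC)))  # ['http://example.com/b']
```

Using `collections.deque` with `maxlen=N` makes the retirement automatic: appending the finished current table drops the table left over from the previous instance of that phase.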

C. The Subject Matter as Claimed in Independent Claim 37 

The claimed subject matter of claim 37 is a system that handles duplicate documents
in a network crawling system. 57 In the claimed system, when there are duplicates, a



47 Application specification ¶¶ [0006], [0008]; element 1470-70 in Fig. 7.

48 Application specification ¶¶ [0008], [0067].

49 Application specification ¶¶ [0068], [0069]; element 1470-60 in Fig. 7.

50 Application specification ¶¶ [0069] - [0074]; elements 1470-60, 1470-65, and 1470-70 in
Fig. 7, Fig. 9 (flowchart showing the table update operation).

51 Application specification ¶ [0070] (unmarking entries as necessary to show that they are no
longer the canonical pages).

52 Application specification ¶¶ [0069], [0070]. See also claim 12.

53 Application specification ¶¶ [0002], [0023].

54 See claim 12, last paragraph.

55 Application specification ¶ [0070]; element 1470-70 in Fig. 7.

56 Application specification ¶ [0078].

57 Application specification ¶¶ [0006], [0011], [0050], [0051]; element 200 in Fig. 2,
representative document is selected for the set of duplicates, and over time the representative
may change. 58

The system includes one or more central processing units (CPUs) and a network
interface. 59 The CPUs execute instructions to operate the system. 60

Initially, a set of N+1 tables is created that stores information about documents on a
network, where N is an integer greater than one. 61 The N+1 tables comprise N tables, each
generated during a respective phase of a set of N crawling phases, and a current table 
generated during a current one of the N crawling phases, wherein an oldest one of the N 
tables was generated during a previous instance of the current crawling phase. 62 The 
information in the tables includes what documents are duplicates, and the rank (or score) of 
each document. 63 The rank generally indicates the importance or popularity of each 
document. 64 

"Crawling" new documents comprises several operations. 65 Briefly, the system 
compares the new document to an existing set of documents having the same content. 66 The 
new document becomes the representative for that content under certain circumstances. 67 If 
the new document becomes the representative, it is indexed. 68 

The claims and specification describe the crawling operation in more detail. The 
system receives a new document, and the new document has two properties: its "document 



elements 300, 322 in Fig. 3. 

58 Application specification ¶¶ [0006], [0008].

59 Application specification ¶¶ [0051], [0058]; elements 302, 310 in Fig. 3.

60 Application specification ¶¶ [0051], [0058]; elements 302, 322 in Fig. 3.

61 Application specification ¶¶ [0024], [0026], [0051], [0052], [0078]; elements 324, 326,
328, 340, 342, 344 in Fig. 3, elements 340, 342, and 344 in Fig. 4.

62 Application specification ¶ [0078].

63 Application specification ¶¶ [0051], [0052]; see, e.g., elements 3410-1 to 3410-4 in Fig. 4.

64 Application specification ¶¶ [0007], [0044].

65 Application specification ¶¶ [0006] - [0008]; Figs. 5-10 (providing flowcharts).

66 Application specification ¶¶ [0006] - [0008]; elements 1470-20, 1470-30, 1470-50, and
1470-60 in Fig. 7.

67 Application specification ¶¶ [0006] - [0008], [0048]; elements 1470-30, 1470-50, 1470-60,
1470-70 of Fig. 7. The representative is sometimes referred to as the "canonical page."

68 Application specification ¶¶ [0069], [0070]. See also claim 12 and specification
paragraphs [0002] and [0023], indicating that indexing makes the page available to a search
engine.




identifier" and a "document rank." 69 The document identifier identifies the content of the 
document. 70 For example, the specification teaches using a 64-bit "content fingerprint." 71
The document rank is a numeric ranking of the document compared to other documents on 
the network. 72 

The system identifies other known documents that share the same content as the new 
document. 73 This data is read from the N+1 tables. 74 This set of documents has a
representative document, which the claims refer to as the "original representative 
document." 75 

Once the new document and the set of documents are identified, the system 
determines a representative of the enlarged set. 76 The enlarged set comprises the set of 
documents identified earlier, together with the new document. 77 The representative of the 
enlarged set may be different from the original representative document because the 
representative of the enlarged set may be the new document. 78 The specification provides 
examples of determining the representative based on the rankings of the documents. 79 The 
system may also apply a hysteresis test so that a change in the representative occurs only 
when the new document is sufficiently better than the original representative document. 80 



69 Application specification ¶¶ [0047], [0062]; element 1450-2 in Fig. 5 and element 1470-10
in Fig. 7.

70 Application specification ¶¶ [0007], [0047].

71 Application specification ¶¶ [0007], [0047].

72 Application specification ¶¶ [0007], [0044].

73 Application specification ¶¶ [0007], [0067], [0068]; element 1470-20 in Fig. 7.

74 Application specification ¶¶ [0007], [0051] (data stored in a content fingerprint table
(CFT)), [0066] - [0068], [0079]; element 340 in Fig. 4.

75 Application specification ¶¶ [0067], [0068]; elements 3410-3 and 3420-2 in Fig. 4. See
also claim 12.

76 Application specification ¶¶ [0006], [0008]; elements 1470-30, 1470-60, 1470-65,
1470-70, and 1470-80 in Fig. 7.

77 Application specification ¶¶ [0006], [0008].

78 Application specification ¶¶ [0006], [0008]; element 1470-70 in Fig. 7.

79 Application specification ¶¶ [0008], [0067].

80 Application specification ¶¶ [0068], [0069]; element 1470-60 in Fig. 7.
The system updates information in the N+1 tables to include the new document and,
potentially, to reflect changes to other documents. 81 For example, if the document that was the original
representative is no longer the representative, the table is updated to show that fact. 82 

In addition, if the new document becomes the representative document, it is 
indexed. 83 Indexing is the operation that makes the document available to a search engine. 84 

The system repeats the crawling operations described above for multiple documents, 
and in some cases the new document becomes the representative document. 85 When the new 
document becomes the representative document, the representative document has changed 
because the original representative document was from a set of documents that did not 
include the new document. 86 

Upon completion of a current crawling phase, the oldest one of the N tables is 
retired. 87 

D. The Subject Matter as Claimed in Independent Claim 40 

The claimed subject matter of claim 40 is a computer program product that handles 
duplicate documents in a network crawling system. 88 In the claimed computer program
product, when there are duplicates, a representative document is selected for the set of
duplicates, and over time the representative may change. 89

Initially, a set of data structures is created that stores information about documents on 
a network. 90 This information includes what documents are duplicates, and the rank (or 



81 Application specification ¶¶ [0069] - [0074]; elements 1470-60, 1470-65, and 1470-70 in
Fig. 7, Fig. 9 (flowchart showing the table update operation).

82 Application specification ¶ [0070] (unmarking entries as necessary to show that they are no
longer the canonical pages).

83 Application specification ¶¶ [0069], [0070]. See also claim 12.

84 Application specification ¶¶ [0002], [0023].

85 See claim 12, last paragraph.

86 Application specification ¶ [0070]; element 1470-70 in Fig. 7.

87 Application specification ¶ [0078].

88 Application specification ¶¶ [0006], [0050], [0051]; elements 300, 322 in Fig. 3.

89 Application specification ¶¶ [0006], [0008].

90 Application specification ¶¶ [0024], [0026], [0051], [0052]; elements 112-1 - 112-Q, 104,
and 106 in Fig. 1, elements 324, 326, 328, 340, 342, 344 in Fig. 3, elements 340, 342, and
344 in Fig. 4.
score) of each document. 91 The rank generally indicates the importance or popularity of each
document. 92 

"Crawling" a requesting document comprises several operations. 93 Briefly, the 
computer program product compares the requesting document to an existing set of documents 
having the same content. 94 The requesting document becomes the representative for that 
content under certain circumstances. 95 If the requesting document becomes the 
representative, it is indexed. 96 

The claims and specification describe the crawling operation in more detail. The 
computer program product receives a requesting document, and the requesting document has 
two properties: its "document identifier" and a "document rank." 97 The document identifier 
identifies the content of the document. 98 For example, the specification teaches using a 64-bit
"content fingerprint." 99 The document rank is a numeric ranking of the document compared
to other documents on the network. 100 

The computer program product identifies other known documents that share the same 
content as the requesting document. 101 This data is read from the data structures. 102 This set 
of documents has a representative document, which the claims refer to as the "original 
representative document." 103 



91 Application specification ¶¶ [0051], [0052]; see, e.g., elements 3410-1 to 3410-4 in Fig. 4.

92 Application specification ¶¶ [0007], [0044].

93 Application specification ¶¶ [0006] - [0008]; Figs. 5-10 (providing flowcharts).

94 Application specification ¶¶ [0006] - [0008]; elements 1470-20, 1470-30, 1470-50, and
1470-60 in Fig. 7.

95 Application specification ¶¶ [0006] - [0008], [0048]; elements 1470-30, 1470-50, 1470-60,
1470-70 of Fig. 7. The representative is sometimes referred to as the "canonical page."

96 Application specification ¶¶ [0069], [0070]. See also claim 12 and specification
paragraphs [0002] and [0023], indicating that indexing makes the page available to a search
engine.

97 Application specification ¶¶ [0047], [0062]; element 1450-2 in Fig. 5 and element 1470-10
in Fig. 7.

98 Application specification ¶¶ [0007], [0047].

99 Application specification ¶¶ [0007], [0047].

100 Application specification ¶¶ [0007], [0044].

101 Application specification ¶¶ [0007], [0067], [0068]; element 1470-20 in Fig. 7.

102 Application specification ¶¶ [0007], [0051] (data stored in a content fingerprint table
(CFT)), [0066] - [0068]; element 340 in Fig. 4.

103 Application specification ¶¶ [0067], [0068]; elements 3410-3 and 3420-2 in Fig. 4. See
Once the requesting document and the set of documents are identified, the computer
program product determines a representative of the enlarged set. 104 The enlarged set
comprises the set of documents identified earlier, together with the requesting document. 105
The representative of the enlarged set may be different from the original representative
document because the representative of the enlarged set may be the requesting document. 106
The specification provides examples of determining the representative based on the rankings
of the documents. 107 The computer program product may also apply a hysteresis test so that a
change in the representative occurs only when the requesting document is sufficiently better
than the original representative document. 108

The computer program product updates information in the data structures to include
the requesting document and, potentially, to reflect changes to other documents. 109 For
example, if the document that was the original representative is no longer the representative,
the data structures are updated to show that fact. 110

In addition, if the requesting document becomes the representative document, it is 
indexed. 111 Indexing is the operation that makes the document available to a search 
engine. 112 

The computer program product repeats the crawling operations described above for 
multiple documents, and in some cases the requesting document becomes the representative 
document. 113 When the requesting document becomes the representative document, the 



also claim 12. 

104 Application specification ¶¶ [0006], [0008]; elements 1470-30, 1470-60, 1470-65,
1470-70, and 1470-80 in Fig. 7.

105 Application specification ¶¶ [0006], [0008].

106 Application specification ¶¶ [0006], [0008]; element 1470-70 in Fig. 7.

107 Application specification ¶¶ [0008], [0067].

108 Application specification ¶¶ [0068], [0069]; element 1470-60 in Fig. 7.

109 Application specification ¶¶ [0069] - [0074]; elements 1470-60, 1470-65, and 1470-70 in
Fig. 7, Fig. 9 (flowchart showing the table update operation).



110 Application specification ¶ [0070] (unmarking entries as necessary to show that they are
no longer the canonical pages).

111 Application specification ¶¶ [0069], [0070]. See also claim 12.

112 Application specification ¶¶ [0002], [0023].

113 See claim 12, last paragraph. 
representative document has changed because the original representative document was from
a set of documents that did not include the requesting document. 114

E. The Subject Matter as Claimed in Independent Claim 50 

The claimed subject matter of claim 50 is a computer program product that handles 
duplicate documents in a network crawling system. 115 In the claimed computer program
product, when there are duplicates, a representative document is selected for the set of
duplicates, and over time the representative may change. 116



Initially, a set of tables is created that stores information about documents on a 
network. 117 This information includes what documents are duplicates, and the rank (or score) 
of each document. 118 The rank generally indicates the importance or popularity of each 
document. 119 

"Crawling" new documents comprises several operations. 120 Briefly, the computer 
program product compares the new document to an existing set of documents having the 
same content. 121 The new document becomes the representative for that content under certain 
circumstances. 122 If the new document becomes the representative, it is indexed. 123 

The claims and specification describe the crawling operation in more detail. The 
computer program product receives a new document, and the new document has two 
properties: its "document identifier" and a "document rank." 124 The document identifier 



114 Application specification ¶ [0070]; element 1470-70 in Fig. 7.

115 Application specification ¶¶ [0006], [0050], [0051]; elements 300, 322 in Fig. 3.

116 Application specification ¶¶ [0006], [0008].

117 Application specification ¶¶ [0024], [0026], [0051], [0052]; elements 324, 326, 328, 340,
342, 344 in Fig. 3, elements 340, 342, and 344 in Fig. 4.

118 Application specification ¶¶ [0051], [0052]; see, e.g., elements 3410-1 to 3410-4 in Fig. 4.

119 Application specification ¶¶ [0007], [0044].

120 Application specification ¶¶ [0006] - [0008]; Figs. 5-10 (providing flowcharts).

121 Application specification ¶¶ [0006] - [0008]; elements 1470-20, 1470-30, 1470-50, and
1470-60 in Fig. 7.

122 Application specification ¶¶ [0006] - [0008], [0048]; elements 1470-30, 1470-50,
1470-60, 1470-70 of Fig. 7. The representative is sometimes referred to as the "canonical page."

123 Application specification ¶¶ [0069], [0070]. See also claim 12 and specification
paragraphs [0002] and [0023], indicating that indexing makes the page available to a search
engine.

124 Application specification ¶¶ [0047], [0062]; element 1450-2 in Fig. 5 and element
1470-10 in Fig. 7.
identifies the content of the document. 125 For example, the specification teaches using a
64-bit "content fingerprint." 126 The document rank is a numeric ranking of the document
compared to other documents on the network. 127

The computer program product identifies other known documents that share the same 
content as the new document. 128 This data is read from the tables. 129 This set of documents
has a representative document, which the claims refer to as the "original representative 
document." 130 

Once the new document and the set of documents are identified, the computer 
program product determines a representative of the enlarged set. 131 The enlarged set 
comprises the set of documents identified earlier, together with the new document. 132 The 
representative of the enlarged set may be different from the original representative document 
because the representative of the enlarged set may be the new document. 133 The specification 
provides examples of determining the representative based on the rankings of the 
documents. 134 The computer program product may also apply a hysteresis test so that a 
change in the representative occurs only when the new document is sufficiently better than 
the original representative document. 135 

The computer program product updates information in the tables to include the new
document and, potentially, to reflect changes to other documents. 136 For example, if the document that



125 Application specification ¶¶ [0007], [0047].

126 Application specification ¶¶ [0007], [0047].

127 Application specification ¶¶ [0007], [0044].

128 Application specification ¶¶ [0007], [0067], [0068]; element 1470-20 in Fig. 7.

129 Application specification ¶¶ [0007], [0051] (data stored in a content fingerprint table
(CFT)), [0066] - [0068]; element 340 in Fig. 4.

130 Application specification ¶¶ [0067], [0068]; elements 3410-3 and 3420-2 in Fig. 4. See
also claim 12.

131 Application specification ¶¶ [0006], [0008]; elements 1470-30, 1470-60, 1470-65,
1470-70, and 1470-80 in Fig. 7.

132 Application specification ¶¶ [0006], [0008].

133 Application specification ¶¶ [0006], [0008]; element 1470-70 in Fig. 7.

134 Application specification ¶¶ [0008], [0067].

135 Application specification ¶¶ [0068], [0069]; element 1470-60 in Fig. 7.

136 Application specification ¶¶ [0069] - [0074]; elements 1470-60, 1470-65, and 1470-70 in
Fig. 7, Fig. 9 (flowchart showing the table update operation).
was the original representative is no longer the representative, the table is updated to show
that fact. 137

In addition, if the new document becomes the representative document, it is 
indexed. 138 Indexing is the operation that makes the document available to a search 
engine. 139 

The computer program product repeats the crawling operations described above for 
multiple documents, and in some cases the new document becomes the representative 
document. 140 When the new document becomes the representative document, the 
representative document has changed because the original representative document was from 
a set of documents that did not include the new document. 141 

F. The Subject Matter as Claimed in Independent Claim 56 

The claimed subject matter of claim 56 is a computer program product that handles 
duplicate documents in a network crawling system. 142 In the claimed computer program 
product, when there are duplicates a representative document is selected for the set of 
duplicates, and over time the representative may change. 143 

Initially, a set of N+1 tables is created that stores information about documents on a 
network, where N is an integer greater than one. 144 The N+1 tables comprise N tables, each 
generated during a respective phase of a set of N crawling phases, and a current table 
generated during a current one of the N crawling phases, wherein an oldest one of the N 
tables was generated during a previous instance of the current crawling phase. 145 The 
information in the tables includes what documents are duplicates, and the rank (or score) of 
each document. 146 The rank generally indicates the importance or popularity of each 
document. 147 

137 Application specification ¶ [0070] (unmarking entries as necessary to show that they are 
no longer the canonical pages). 

138 Application specification ¶¶ [0069], [0070]. See also claim 12. 

139 Application specification ¶¶ [0002], [0023]. 

140 See claim 12, last paragraph. 

141 Application specification ¶ [0070]; element 1470-70 in Fig. 7. 

142 Application specification ¶¶ [0006], [0050], [0051]; elements 300, 322 in Fig. 3. 

143 Application specification ¶¶ [0006], [0008]. 

144 Application specification ¶¶ [0024], [0026], [0051], [0052], [0078]; elements 324, 326, 
328, 340, 342, 344 in Fig. 3, elements 340, 342, and 344 in Fig. 4. 

145 Application specification ¶ [0078]. 

"Crawling" new documents comprises several operations. 148 Briefly, the computer 
program product compares the new document to an existing set of documents having the 
same content. 149 The new document becomes the representative for that content under certain 
circumstances. 150 If the new document becomes the representative, it is indexed. 151 

The claims and specification describe the crawling operation in more detail. The 
computer program product receives a new document, and the new document has two 
properties: its "document identifier" and a "document rank." 152 The document identifier 
identifies the content of the document. 153 For example, the specification teaches using a 
64-bit "content fingerprint." 154 The document rank is a numeric ranking of the document 
compared to other documents on the network. 155 
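As an illustration of a document identifier of this kind, the following Python sketch derives a 64-bit value from a document's content. The choice of SHA-256 truncated to its first eight bytes is an assumption made only for the example: the brief states the 64-bit width but not the fingerprint function, and content_fingerprint is a hypothetical name. Documents with identical content at different addresses receive the same identifier, which is what allows them to be grouped as duplicates.

```python
import hashlib

def content_fingerprint(content: bytes) -> int:
    """Return a 64-bit document identifier computed from document content.

    Assumption: SHA-256 truncated to its first 8 bytes stands in for the
    fingerprint function, which the brief does not specify.
    """
    digest = hashlib.sha256(content).digest()   # 32-byte digest
    return int.from_bytes(digest[:8], "big")    # keep 64 bits as an unsigned int

copy_a = b"<html><body>Same text</body></html>"  # fetched from one address
copy_b = b"<html><body>Same text</body></html>"  # fetched from another address
print(content_fingerprint(copy_a) == content_fingerprint(copy_b))  # True
```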

The computer program product identifies other known documents that share the same 
content as the new document. 156 This data is read from the N+1 tables. 157 This set of 
documents has a representative document, which the claims refer to as the "original 
representative document." 158 



146 Application specification ¶¶ [0051], [0052]; see, e.g., elements 3410-1 to 3410-4 in Fig. 4. 

147 Application specification ¶¶ [0007], [0044]. 

148 Application specification ¶¶ [0006]-[0008]; Figs. 5-10 (providing flowcharts). 

149 Application specification ¶¶ [0006]-[0008]; elements 1470-20, 1470-30, 1470-50, and 
1470-60 in Fig. 7. 

150 Application specification ¶¶ [0006]-[0008], [0048]; elements 1470-30, 1470-50, 
1470-60, 1470-70 of Fig. 7. The representative is sometimes referred to as the "canonical page." 

151 Application specification ¶¶ [0069], [0070]. See also claim 12 and specification 
paragraphs [0002] and [0023], indicating that indexing makes the page available to a search 
engine. 

152 Application specification ¶¶ [0047], [0062]; element 1450-2 in Fig. 5 and element 
1470-10 in Fig. 7. 

153 Application specification ¶¶ [0007], [0047]. 

154 Application specification ¶¶ [0007], [0047]. 

155 Application specification ¶¶ [0007], [0044]. 

156 Application specification ¶¶ [0007], [0067], [0068]; element 1470-20 in Fig. 7. 

157 Application specification ¶¶ [0007], [0051] (data stored in a content fingerprint table 
(CFT)), [0066]-[0068], [0079]; element 340 in Fig. 4. 

158 Application specification ¶¶ [0067], [0068]; elements 3410-3 and 3420-2 in Fig. 4. See 
also claim 12. 

Once the new document and the set of documents are identified, the computer 
program product determines a representative of the enlarged set. 159 The enlarged set 
comprises the set of documents identified earlier, together with the new document. 160 The 
representative of the enlarged set may be different from the original representative document 
because the representative of the enlarged set may be the new document. 161 The specification 
provides examples of determining the representative based on the rankings of the 
documents. 162 The computer program product may also apply a hysteresis test so that a 
change in the representative occurs only when the new document is sufficiently better than 
the original representative document. 163 

The computer program product updates information in the N+1 tables to include the 
new document, and potentially changes to other documents. 164 For example, if the document 
that was the original representative is no longer the representative, the table is updated to 
show that fact. 165 

In addition, if the new document becomes the representative document, it is 
indexed. 166 Indexing is the operation that makes the document available to a search 
engine. 167 

The computer program product repeats the crawling operations described above for 
multiple documents, and in some cases the new document becomes the representative 
document. 168 When the new document becomes the representative document, the 
representative document has changed because the original representative document was from 
a set of documents that did not include the new document. 169 

159 Application specification ¶¶ [0006], [0008]; elements 1470-30, 1470-60, 1470-65, 
1470-70, and 1470-80 in Fig. 7. 

160 Application specification ¶¶ [0006], [0008]. 

161 Application specification ¶¶ [0006], [0008]; element 1470-70 in Fig. 7. 

162 Application specification ¶¶ [0008], [0067]. 

163 Application specification ¶¶ [0068], [0069]; element 1470-60 in Fig. 7. 

164 Application specification ¶¶ [0069]-[0074]; elements 1470-60, 1470-65, and 1470-70 in 
Fig. 7; Fig. 9 (flowchart showing the table update operation). 

165 Application specification ¶ [0070] (unmarking entries as necessary to show that they are 
no longer the canonical pages). 

166 Application specification ¶¶ [0069], [0070]. See also claim 12. 

167 Application specification ¶¶ [0002], [0023]. 

168 See claim 12, last paragraph. 

Upon completion of a current crawling phase, the oldest one of the N tables is 
retired. 170 
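The arrangement of N+1 tables and the retirement of the oldest table can be pictured with the following minimal Python model. The class PhasedTables, its methods, and the representation of each table as a dictionary from document identifier to {address: rank} are hypothetical. The sketch shows only which tables are read, which table receives updates, and which table is retired at the end of a crawling phase.

```python
from collections import deque
from typing import Deque, Dict

Table = Dict[int, Dict[str, float]]  # doc identifier -> {address: rank}

class PhasedTables:
    """N completed-phase tables plus one current table (N+1 in total)."""

    def __init__(self, n: int) -> None:
        if n < 2:
            raise ValueError("N must be an integer greater than one")
        self.n = n
        self.completed: Deque[Table] = deque()  # oldest table at the left
        self.current: Table = {}

    def record(self, doc_id: int, address: str, rank: float) -> None:
        # Updates go only to the current table.
        self.current.setdefault(doc_id, {})[address] = rank

    def lookup(self, doc_id: int) -> Dict[str, float]:
        # Read from all N+1 tables, oldest first, so newer entries override.
        merged: Dict[str, float] = {}
        for table in [*self.completed, self.current]:
            merged.update(table.get(doc_id, {}))
        return merged

    def finish_phase(self) -> None:
        # On completion of the current phase, retire the oldest of the N
        # tables (once N exist) and keep the current table in its place.
        if len(self.completed) == self.n:
            self.completed.popleft()
        self.completed.append(self.current)
        self.current = {}

t = PhasedTables(2)
t.record(7, "a", 1.0); t.finish_phase()
t.record(7, "b", 2.0); t.finish_phase()
t.record(7, "c", 3.0)
print(sorted(t.lookup(7)))  # ['a', 'b', 'c']
t.finish_phase()            # retires the table holding "a"
print(sorted(t.lookup(7)))  # ['b', 'c']
```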



169 Application specification ¶ [0070]; element 1470-70 in Fig. 7. 

170 Application specification ¶ [0078]. 
VI. Grounds of Rejection Presented for Review 



Examiner Morrison has rejected all of the pending claims under 35 U.S.C. § 103. 
A. The § 103 Rejection of Claims 12-17, 40, 42-48, and 50-55. 

The Examiner stated in the Office Action mailed 08/20/2007 on page 2 that: 

Claims 12-17, 40, 42-48 and 50-55 are rejected under 35 U.S.C. 103(a) 
as being unpatentable over Meyerzon et al. ('Meyerzon' hereinafter) (Patent 
Number 6,547,829) in view of Cho et al. ('Cho' hereinafter) ("Finding 
replicated web collections," by Cho et al., Proceedings of the ACM SIGMOD 
International Conference on Management of Data, pages 355-366, 2000). 

As per claim 12, Meyerzon teaches 

A method of detecting duplicate documents in a network crawling 
system, comprising: (see abstract and background) 

constructing a plurality of tables, each table corresponding to a portion 
of a document address space (builds new index based on documents, column 
4, lines 43-60), storing information identifying documents having a same 
document identifier and each identified document having an associated 
document rank; (column 2, lines 3-16) 

receiving a newly crawled document, such document characterized by 
a document identifier and a document rank; (column 2, lines 3-16) 

reading information stored in the plurality of tables to identify a set of 
documents, sharing the document identifier of the newly crawled document, 
and ascertaining as original representative document for the identified set of 
documents; (column 9, lines 18-29) 

updating the information stored in at least one of the tables in 
accordance with the document ranks of the identified set of documents and the 
newly crawled document; (column 2, lines 3-16) 

determining a representative document for the newly crawled 
documents and the identified set of documents (column 9, lines 32-40) 

Meyerzon does not explicitly indicate "indexing the representative 
document when the representative document is the newly crawled document; 
and repeating the receiving, reading, updating, determining and indexing 
operations with respect to a plurality of newly crawled documents, each of 
which shares a respective document identifier with a respective set of 
documents, such that at least some of the newly crawled documents are 
determined to be representative documents and are indexed". 

However, Cho discloses "indexing the representative document when 
the representative document is the newly crawled document; and repeating the 
receiving, reading, updating, determining and indexing operations with respect 
to a plurality of newly crawled documents, each of which shares a respective 
document identifier with a respective set of documents, such that at least some 
of the newly crawled documents are determined to be representative 
documents and are indexed" (newly replicated collection, page 365, first 
column, second paragraph; one page displayed or represents collection of 
duplicate document, page 365, second column, first paragraph). 

It would have been obvious to one of skill in the art at the time the 
invention was made to combine Meyerzon and Cho because using the steps of 
"indexing the representative document when the representative document is 
the newly crawled document; and repeating the receiving, reading, updating, 
determining and indexing operations with respect to a plurality of newly 
crawled documents, each of which shares a respective document identifier 
with a respective set of documents, such that at least some of the newly 
crawled documents are determined to be representative documents and are 
indexed" would have given those skilled in the art the tools to improve the 
invention by allowing duplicate documents to be identified and represented. 
This gives the use the advantage of not having multiple copies of the same 
document to choose from. 

For this appeal, Appellants focus on the limitation "repeating the receiving, reading, 
updating, determining and indexing operations with respect to a plurality of newly crawled 
documents, each of which shares a respective document identifier with a respective set of 
documents, such that at least some of the newly crawled documents are determined to be 
representative documents and are indexed." This limitation is a portion of what the Examiner 
indicated was not taught by Meyerzon. Appellants will demonstrate that this limitation is not 
taught by Meyerzon or Cho. 

Appellants seek to streamline the appeal process by focusing on this limitation, but do 
not thereby admit to the correctness or appropriateness of any other statement or issue raised 
by Examiner Morrison. 

B. The § 103 Rejection of Claims 18-20, 37-39, and 56-58. 

The Examiner stated in the Office Action mailed 08/20/2007 on page 11 that: 
Claims 18-20, 37-39 and 56-58 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Meyerzon et al. ('Meyerzon' hereinafter) (Patent 
Number 6,547,829) in view of Cho et al. ('Cho' hereinafter) ("Finding 
replicated web collections," by Cho et al., Proceedings of the ACM SIGMOD 
International Conference on Management of Data, pages 355-366, 2000) and 
further in view of Rujan et al. ('Rujan' hereinafter) (Patent Number 6,976,207). 

The Examiner's analysis of independent claim 18 is similar to that in claim 12 above, 
and includes the same statements regarding the limitation "indexing the representative 
document when the representative document is the newly crawled document; and repeating 
the receiving, reading, updating, determining and indexing operations with respect to a 
plurality of newly crawled documents, each of which shares a respective document identifier 
with a respective set of documents, such that at least some of the newly crawled documents 
are determined to be representative documents and are indexed." 

As above, Appellants focus on the limitation "repeating the receiving, reading, 
updating, determining and indexing operations with respect to a plurality of newly crawled 
documents, each of which shares a respective document identifier with a respective set of 
documents, such that at least some of the newly crawled documents are determined to be 
representative documents and are indexed," making no admission regarding the correctness 
or appropriateness of any other statement or issue raised by Examiner Morrison. Appellants 
will demonstrate that this limitation is not taught by Meyerzon, Cho, or Rujan. 

C. The § 103 Rejection of Claim 49. 

The Examiner cited a fourth reference against dependent claim 49. Specifically, the 
Examiner, in the Office Action mailed 08/20/2007 at page 15, stated that: 

Claim 49 is rejected under 35 U.S.C. 103(a) as being unpatentable over 
Meyerzon et al. ('Meyerzon' hereinafter) (Patent Number 6,547,829) in view 
of Cho et al. ('Cho' hereinafter) ("Finding replicated web collections," by Cho 
et al., Proceedings of the ACM SIGMOD International Conference on 
Management of Data, pages 355-366, 2000) and further in view of Lambert et 
al. ('Lambert' hereinafter) (Patent Number 6,976,207 [sic, Patent Application 
Publication 2002/0038350]). 

The Examiner cited Lambert to address a specific limitation in dependent claim 49, 
which Appellants are not addressing at this time. As above, Appellants focus on the 
limitation "repeating the receiving, reading, updating, determining and indexing operations 
with respect to a plurality of newly crawled documents, each of which shares a respective 
document identifier with a respective set of documents, such that at least some of the newly 
crawled documents are determined to be representative documents and are indexed," making 
no admission regarding the correctness or appropriateness of any other statement or issue 
raised by Examiner Morrison. Appellants will demonstrate that this limitation is not taught 
by Meyerzon, Cho, or Lambert. 

D. Summary of Rejections 

The limitation "repeating the receiving, reading, updating, determining and indexing 
operations with respect to a plurality of newly crawled documents, each of which shares a 
respective document identifier with a respective set of documents, such that at least some of 
the newly crawled documents are determined to be representative documents and are 
indexed" appears in each of the independent claims, and has been rejected on the same basis. 
In each case the Examiner asserted that this limitation is taught by Cho. 
VII. Argument 

Appellants argue that the limitation "repeating the receiving, reading, updating, 
determining and indexing operations with respect to a plurality of newly crawled documents, 
each of which shares a respective document identifier with a respective set of documents, 
such that at least some of the newly crawled documents are determined to be representative 
documents and are indexed" is not taught by any of the asserted references Meyerzon, Cho, 
Rujan, or Lambert. 

A. To reject claims under 35 U.S.C. § 103, all claim limitations must be 
taught. 

Case law requires that to "establish prima facie obviousness of a claimed invention, 
all the claim limitations must be taught or suggested by the prior art." In re Royka, 490 F.2d 
981, 180 USPQ 580 (CCPA 1974) as cited at MPEP 2143.03. 

B. A "representative document" is the one indexed and thus presented to a 
user. 

Indexing is the operation that makes documents available to a search engine. 171 By 
indexing a representative document when there are duplicates, the system saves processing 
resources. 172 In addition, indexing a representative document from each set of duplicates 
provides a better user experience in response to a query: diverse results are not crowded out 
by duplicates. 173 Indexing a representative document from each set of duplicates is how the 
invention achieves its results. 

Indexing a representative document is recited in the claims. The claims require 
"determining a representative document for the newly crawled document and the identified 
set of documents" and "indexing the representative document when the representative 
document is the newly crawled document." This makes sense. The documents in the 
"identified set of documents" all have the same document content as the newly crawled 
document. If the newly crawled document becomes the representative, then it needs to be 
indexed; but if the representative document stays the same, the newly crawled document does 
not need to be indexed. 

171 Application specification ¶¶ [0002], [0023]. 

172 Application specification ¶ [0004]. 

The diagram below depicts this process graphically. The top portion shows newly 
crawled document F, and the set of documents A, B, C, D, and E, all sharing the same content 
as document F. Here, document C is shown as the original representative document for 
documents A, B, C, D, and E. Next, document F is added to the set, and a representative 
selected. The middle portion of the diagram shows the case where document C remains the 
representative. The bottom portion of the diagram shows the case where document F 
becomes the representative. In this case document F is indexed. 



[Diagram. Top panel: "Newly Crawled Document" F and "Set of Documents" A, B, C, D, and E, 
with C labeled "Original Representative Document." Case 1: The Representative Document 
Remains the Same (newly crawled document and set of documents, with C as the 
representative document). Case 2: The Representative Document Becomes the Newly Crawled 
Document, and is Indexed (newly crawled document and set of documents, with F as the 
representative document).] 
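The two cases in the diagram can be traced with a short Python sketch. The ranks assigned to documents A through F are invented for illustration, and the selection rule used here (the newly crawled document becomes the representative only if its rank is strictly higher than that of the current representative) is a simplifying assumption rather than the specification's rank comparison or hysteresis test. The name add_duplicate is hypothetical.

```python
def add_duplicate(ranks: dict, representative: str, new_doc: str,
                  new_rank: float, index: list) -> str:
    """Add new_doc to a set of same-content documents; return the representative.

    Assumed rule: the new document becomes the representative only if its
    rank is strictly higher than the current representative's rank, and it
    is indexed only in that case.
    """
    ranks[new_doc] = new_rank                  # update the stored information
    if new_rank > ranks[representative]:       # determine the representative
        index.append(new_doc)                  # Case 2: index the new representative
        return new_doc
    return representative                      # Case 1: representative unchanged

# Hypothetical ranks for documents A-E; C is the original representative.
base = {"A": 2.0, "B": 1.0, "C": 5.0, "D": 3.0, "E": 0.5}

index1 = []
rep1 = add_duplicate(dict(base), "C", "F", 4.0, index1)   # Case 1
index2 = []
rep2 = add_duplicate(dict(base), "C", "F", 7.0, index2)   # Case 2
print(rep1, index1)  # C []
print(rep2, index2)  # F ['F']
```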



173 Application specification ¶ [0004]. 
C. The claims require that the representative document changes for some 
documents. 

The claim language "such that at least some of the newly crawled documents are 
determined to be representative documents and are indexed" conveys the point that the 
representative document changes for some of the documents. Importantly, the claims address 
the case where the newly crawled documents are duplicates of documents already known, so 
selecting the newly crawled document as the representative changes the representative. 

For each newly crawled document, the reading operation identifies an original 
representative document with the same content 174 as the newly crawled document. The 
original representative document is not the newly crawled document because the original 
representative document was ascertained from a set of documents already stored in tables. 

Therefore, when newly crawled documents are "determined to be representative 
documents and are indexed," the representative documents have changed. 
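The effect of repeating these operations across many newly crawled documents can likewise be sketched in Python. The function crawl, its arguments, and the strictly-higher-rank selection rule are hypothetical simplifications. Because every incoming document is assumed to share its identifier with a set already stored in the tables, each document that becomes a representative displaces an original representative, and the sketch records each such change.

```python
from typing import Dict, List, Tuple

def crawl(stream: List[Tuple[int, str, float]],
          sets: Dict[int, Dict[str, float]],
          reps: Dict[int, str]) -> List[Tuple[str, str]]:
    """Repeat receive/read/update/determine/index over newly crawled documents.

    stream: (doc_id, address, rank) for each newly crawled document; each
            doc_id is assumed to match a set already stored in `sets`.
    sets:   doc_id -> {address: rank} for documents already in the tables.
    reps:   doc_id -> address of the current representative (updated in place).
    Returns (original representative, new representative) for every document
    that became a representative and was indexed.
    """
    changes = []
    for doc_id, address, rank in stream:           # receive
        members = sets[doc_id]                     # read: identified set
        original = reps[doc_id]                    # original representative
        members[address] = rank                    # update
        if rank > members[original]:               # determine (assumed rule)
            reps[doc_id] = address
            changes.append((original, address))    # index: representative changed
    return changes

sets = {1: {"a": 3.0, "b": 1.0}, 2: {"x": 9.0}}
reps = {1: "a", 2: "x"}
changes = crawl([(1, "c", 4.0), (2, "y", 2.0), (1, "d", 6.0)], sets, reps)
print(changes)  # [('a', 'c'), ('c', 'd')]
print(reps)     # {1: 'd', 2: 'x'}
```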

D. Meyerzon does not teach a web crawling methodology where the 
representative document can change. 

Meyerzon addresses the detection of duplicate documents, but responds with a "first 
copy wins" approach. Meyerzon explains this in the specification at column 9, lines 33-40, 
with reference to Figure 3. When a document is crawled, the crawler determines if the 
content of the newly crawled document matches the content of a document already in the 
history table. 175 If the content already exists, then the address (URL) of the newly crawled 
document is just saved to the history table. If it is not found, then several steps are 
performed, including step S25, which indexes the new document. Because the first copy of a 
document is always the one that is indexed, there is no discussion of representative 
documents, or changing the representative document. 
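For contrast, the "first copy wins" behavior characterized above can be modeled in a few lines of Python. This is a simplified sketch of the behavior as described in this brief, not Meyerzon's code; the function name, the history layout, and the example URLs are hypothetical. Because a stored entry is never replaced, a later copy of the same content is never indexed.

```python
def first_copy_wins(history: dict, cid: int, url: str, index: list) -> None:
    """Record a crawled document under its content identifier (CID).

    history: CID -> list of URLs seen with that content (hypothetical layout).
    index:   URLs that have been indexed.
    """
    if cid in history:
        history[cid].append(url)   # content already known: only save the URL
    else:
        history[cid] = [url]       # first copy of this content
        index.append(url)          # first copy is indexed (step S25 in the description above)

history, index = {}, []
for url in ["http://a.example/page", "http://b.example/copy"]:
    first_copy_wins(history, 42, url, index)
print(index)  # ['http://a.example/page']
```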

In addition, the Examiner pointed out that Meyerzon does not teach the limitation 
addressed in this appeal. In the Office Action mailed 08/20/2007, the Examiner stated on 
page 4 that "Meyerzon does not explicitly indicate 'indexing the representative document 
when the representative document is the newly crawled document; and repeating the 
receiving, reading, updating, determining and indexing operations with respect to a plurality 
of newly crawled documents, each of which shares a respective document identifier with a 
respective set of documents, such that at least some of the newly crawled documents are 
determined to be representative documents and are indexed.'" 

174 The claim language literally says "sharing the document identifier." Paragraph [0004] in 
the specification explains: "In one embodiment, a document identifier is a fixed length 
fingerprint of a document's content." 

175 Meyerzon uses a "CID," which is defined as a "content identifier." See abstract; column 
2, lines 65-67. 

E. Cho does not teach a web crawling methodology where the representative 
document can change. 

Cho teaches detection of duplicate documents, but the first copy found remains the 
permanent representative. Cho refers to this methodology as a "replica avoiding process." 176 
Specifically, "each crawl identifies new replicated collections that can be avoided in the 
future." 177 Like Meyerzon, the first copy discovered is the one that is indexed and used. This 
section of Cho does not teach changing and indexing a representative document. 

Cho also presents a revised way to display query results, but does not suggest 
determining and indexing a new representative. 178 Cho teaches that it is useful to continue 
gathering multiple copies of document collections because one of the copies may be 
unavailable later. 179 In response to a user query, Cho discloses a "presentation filter" that 
"rolls up" collections so that "it only displays the link of one page in a collection, even if 
multiple pages within the collection satisfy the query." 180 Thus, Cho keeps a record of 
duplicate documents which can be presented to a user, but only one is indexed and in the 
normal course only one is presented to the user. 

The Examiner refers to §§ 5.1, 5.2 of Cho, 181 which do not teach changing and 
indexing the representative for a set of duplicate documents. The Examiner first refers to 
"newly replicated collection, page 365, first column, second paragraph." This is § 5.1 of 
Cho, which discloses only a replica avoiding process. The first copy is indexed, and remains 
the representative permanently. The Examiner then refers to "one page displayed or 
represents collection of duplicate document, page 365, second column, first paragraph." This 
is § 5.2 of Cho, which teaches only a presentation filter, as described above. Thus, neither 
§ 5.1 nor § 5.2 teaches having a set of duplicate documents where there is a representative 
document that changes and is indexed. 

176 Cho, last paragraph of § 5.1. 

177 Cho, last paragraph of § 5.1. 

178 Cho § 5.2, ¶ 1. 

179 Cho § 5.2, ¶ 1. 

180 Cho § 5.2, ¶ 5. 

181 Office Action mailed 08/20/2007 at page 4. 

F. Rujan does not teach a web crawling methodology where the 
representative document can change. 

The Examiner argued that Rujan teaches a limitation regarding "retiring the oldest 
one," which is not the limitation addressed in this appeal. 182 Rujan teaches a classification 
method, which is not relevant to the claim limitation addressed here. Most importantly, 
Rujan contains no discussion of duplicate documents. Without a discussion of duplicate 
documents, there is no discussion of representative documents or changing representative 
documents. 

G. Lambert does not teach a web crawling methodology where the 
representative document can change. 

The Examiner argued that Lambert teaches a limitation regarding "a document is a 
temporary redirect page . . .," which is not the limitation addressed in this appeal. 183 Lambert 
teaches a method of "enhanced web page delivery," which Lambert says "employs 
techniques in identifying visitors, both humans and search engine spiders, and appropriately 
redirecting them to specific Universal Resource Locators." The subject matter in Lambert is 
not relevant to the claim limitation addressed here. Most importantly, Lambert contains no 
discussion of duplicate documents. Without a discussion of duplicate documents, there is no 
discussion of representative documents or changing representative documents. 

H. Conclusion 

In summary, Appellants have demonstrated that the § 103 rejections cannot be 
sustained because the combination of references does not teach the claim limitation 
"repeating the receiving, reading, updating, determining and indexing operations with respect 
to a plurality of newly crawled documents, each of which shares a respective document 
identifier with a respective set of documents, such that at least some of the newly crawled 
documents are determined to be representative documents and are indexed," which appears in 
all of the pending claims. 

182 Appellants do not admit that Rujan teaches the limitation as stated by the Examiner, but 
do not address that issue in this appeal. 

183 Appellants do not admit that Lambert teaches the limitation as stated by the Examiner, but 
do not address that issue in this appeal. 

In view of the foregoing, Appellants respectfully request the reversal of Examiner 
Morrison's rejections. Appellants further request allowance of the pending claims 12-20, 
37-40, and 42-58. If there are any other fees due in connection with the filing of this Brief, please 
charge the fees to Morgan, Lewis & Bockius LLP Deposit Account No. 50-0310 (order no. 
60963-0005-US). 

If a fee is required for an extension of time under 37 C.F.R. § 1.136 not accounted for 
above, such an extension is requested and the fee should be charged to Morgan, Lewis & 
Bockius LLP Deposit Account No. 50-0310 (order no. 60963-0005-US). 

Respectfully submitted, 

MORGAN, LEWIS & BOCKIUS LLP 



Dated: April 28, 2008 By: /Gary S. Williams/ 

Gary S. Williams 

Customer No.: 24341 Reg. No. 31,066 

MORGAN, LEWIS & BOCKIUS LLP 

3000 El Camino Real, Suite 700 
Palo Alto, CA 94306 
650-843-4000 
VIII. Claims Appendix 

Claims Currently on Appeal Ordered By Number 
1-11. (Canceled). 

12. A method of detecting duplicate documents in a network crawling system, 
comprising: 

constructing a plurality of tables, each table corresponding to a portion of a document 
address space, storing information identifying documents having a same document identifier 
and each identified document having an associated document rank; 

receiving a newly crawled document, such document characterized by a document 
identifier and a document rank; 

reading information stored in the plurality of tables to identify a set of documents 
sharing the document identifier of the newly crawled document, and ascertaining an original 
representative document for the identified set of documents; 

updating the information stored in at least one of the tables in accordance with the 
document ranks of the identified set of documents and the newly crawled document; 

determining a representative document for the newly crawled document and the 
identified set of documents; 

indexing the representative document when the representative document is the newly 
crawled document; and 

repeating the receiving, reading, updating, determining and indexing operations with 
respect to a plurality of newly crawled documents, each of which shares a respective 
document identifier with a respective set of documents, such that at least some of the newly 
crawled documents are determined to be representative documents and are indexed. 

13. The method of claim 12, wherein information identifying the identified set of 
documents, including a particular document serving as the original representative document 
of the identified set, is stored in one or more tables. 

14. The method of claim 13, wherein the determining includes 

comparing the document rank of the newly crawled document with that of the 
particular document from the identified set in accordance with a set of predefined comparison 
criteria; 
selecting the newly crawled document as the representative document if the set of 
predefined comparison criteria are met; and 

keeping the particular document as the representative document if the set of 
predefined comparison criteria is not met. 

15. The method of claim 14, wherein the set of predefined comparison criteria comprise 
at least two parameters, one parameter for comparison with an absolute difference of 
document ranks between the newly crawled document and the particular document, and 
another parameter for comparison with a ratio of document ranks between the newly crawled 
document and the particular document. 

16. The method of claim 12, wherein the updating includes inserting information 
identifying the newly crawled document into the at least one table only when a predefined 
insertion condition is satisfied. 

17. The method of claim 16, wherein the predefined insertion condition is that the 
document rank of the newly crawled document is higher than the document rank of at least 
one document in the identified set of documents. 

18. A method of detecting duplicate documents in a network crawling system, 
comprising: 

constructing a plurality of tables, each table corresponding to a segment of a 
document address space, storing information identifying documents having a same document 
identifier and each identified document having an associated document rank, wherein the 
plurality of tables comprise N+1 tables where N is an integer greater than one, wherein the 
N+1 tables comprise N tables, each generated during a respective phase of a set of N 
crawling phases, and a current table generated during a current one of the N crawling phases, 
wherein an oldest one of the N tables was generated during a previous instance of the current 
crawling phase; 

receiving a newly crawled document, such document characterized by a document 
identifier and a document rank; 

reading information stored in the N+1 tables to identify a set of documents sharing the 
document identifier of the newly crawled document, and ascertaining an original 
representative document for the identified set of documents; 

updating the information stored in the current table in accordance with the document 
rankings of the identified set of documents and the newly crawled document; 

determining a representative document for the newly crawled document and the 
identified set of documents; 

indexing the representative document when said representative document is the newly 
crawled document; 

repeating the receiving, reading, updating, determining and indexing operations with 
respect to a plurality of newly crawled documents, each of which shares a respective 
document identifier with a respective set of documents, such that at least some of the newly 
crawled documents are determined to be representative documents and are indexed; and 

upon completion of the current crawling phase, retiring the oldest one of the N tables. 

19. The method of claim 18, wherein the reading comprises reading from a merged table 
that stores information from a plurality of the N tables, and reading from the current table. 
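The N+1 table arrangement of claims 18 and 19 behaves like a ring of per-phase tables. The Python sketch below is illustrative only. The class and method names are hypothetical. It ignores the division of the document address space into segments, assuming a single segment, and it stores each document's rank keyed by its address.

```python
from typing import Dict, List

# Hypothetical layout of one table: document identifier -> {address: document rank}.
Table = Dict[str, Dict[str, float]]


class RotatingDuplicateTables:
    """N per-phase tables plus a current table (N + 1 in total)."""

    def __init__(self, n: int) -> None:
        if n < 2:
            raise ValueError("the claims require N to be greater than one")
        self.n = n
        self.phase = 0  # index of the current crawling phase, 0 .. n-1
        # tables[i] holds entries recorded during the latest completed instance of phase i.
        self.tables: List[Table] = [{} for _ in range(n)]
        self.current: Table = {}

    def record(self, doc_id: str, address: str, rank: float) -> None:
        """Update only the current table."""
        self.current.setdefault(doc_id, {})[address] = rank

    def lookup(self, doc_id: str) -> Dict[str, float]:
        """Read the N+1 tables and return {address: rank} for documents sharing doc_id."""
        merged: Dict[str, float] = {}
        # Visit the N older tables from oldest to newest so newer ranks overwrite older ones.
        # Claim 19 instead reads a pre-merged copy of the older tables; merging on each
        # lookup keeps this sketch short.
        for k in range(self.n):
            merged.update(self.tables[(self.phase + k) % self.n].get(doc_id, {}))
        merged.update(self.current.get(doc_id, {}))
        return merged

    def end_phase(self) -> None:
        """Retire the oldest table and start a new current table."""
        # The slot for the current phase still holds the table written during the previous
        # instance of this phase, i.e. the oldest of the N tables; overwriting retires it.
        self.tables[self.phase] = self.current
        self.current = {}
        self.phase = (self.phase + 1) % self.n
```

In this sketch, a document recorded during one phase stays visible for N further phases and then drops out when its table is retired.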

20. The method of claim 18, wherein information identifying the identified set of 
documents, including a particular document serving as the original representative document 
of the identified set, is stored in one or more tables. 

21-36. (Canceled). 

37. A system for detecting duplicate documents during network crawling, comprising: 
one or more central processing units for executing programs; 
a network interface for receiving documents; and 

a duplicate document detection engine executable by the one or more central 
processing units, the engine comprising: 

a plurality of tables, each table corresponding to a segment of a document 
address space, storing information identifying documents having a same document identifier 
and each identified document having an associated document rank, wherein the plurality of 
tables comprise N+1 tables where N is an integer greater than one, wherein the N+1 tables 
comprise N tables, each generated during a respective phase of a set of N crawling phases, 
and a current table generated during a current one of the N crawling phases, wherein an 
oldest one of the N tables was generated during a previous instance of the current crawling 
phase; 

instructions for receiving a newly crawled document, such document 
characterized by a document identifier and a document rank; 

instructions for reading information stored in the N+1 tables to identify a set 
of documents, sharing the document identifier of the newly crawled document, and 
ascertaining an original representative document for the identified set of documents; 

instructions for updating the information stored in the current table in 
accordance with the document rankings of the identified set of documents and the newly 
crawled document; 

instructions for determining a representative document for the newly crawled 
document and the identified set of documents; 

instructions for indexing the representative document when said representative 
document is the newly crawled document; 

instructions for repeating the receiving, reading, updating, determining and 
indexing operations with respect to a plurality of newly crawled documents, each of which 
shares a respective document identifier with a respective set of documents, such that at least 
some of the newly crawled documents are determined to be representative documents and are 
indexed; and 

instructions for retiring the oldest one of the N tables upon completion of the current 
crawling phase. 

38. The system of claim 37 wherein the reading comprises reading from a merged table 
that stores information from a plurality of the N tables, and reading from the current table. 

39. The system of claim 37, wherein the identified set of documents, including a 
particular document serving as the original representative document of the identified set, are 
stored in one or more tables. 

40. A computer program product for use in conjunction with a computer system, the 
computer program product comprising a computer readable storage medium and a computer 
program mechanism embedded therein, the computer program mechanism comprising: 

instructions for constructing a plurality of data structures for storing information of 
documents, each document characterized by a document identifier and a document rank, the 
information stored in the plurality of data structures include the document identifier and a 
document rank for each document; 

instructions for receiving a requesting document in association with its document 
identifier and document rank; 

instructions for selecting from the plurality of data structures a set of documents 
sharing the same document identifier as the requesting document, and ascertaining an original 
representative document for the identified set of documents; 

instructions for generating a new set of documents from the requesting document and 
the selected set of documents in accordance with their document rank; 

instructions for identifying a representative document of the new set of documents; 

instructions for indexing the representative document when said representative 
document is the newly crawled document; and 

instructions for repeating the receiving, reading, updating, determining and indexing 
operations with respect to a plurality of newly crawled documents, each of which shares a 
respective document identifier with a respective set of documents, such that at least some of 
the newly crawled documents are determined to be representative documents and are 
indexed. 

41. (Canceled). 

42. The computer program product of claim 40, wherein the plurality of data structures 
include a data structure for storing information of multiple sets of documents, each set of 
documents sharing a same document content. 

43. The computer program product of claim 40, wherein the plurality of data structures 
include a data structure for storing information of multiple sets of documents, each set of 
documents sharing a same document address. 

44. The computer program product of claim 40, wherein the document identifier is a fixed 
length fingerprint of document content of a document characterized by the document 
identifier. 

45. The computer program product of claim 40, wherein the document identifier is a fixed 
length fingerprint of an address of a document characterized by the document identifier. 
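Claims 44 and 45 recite a fixed-length fingerprint of a document's content or of its address. The claims do not specify the hash function or the length. Purely for illustration, the sketch below assumes a SHA-256 digest truncated to 64 bits.

```python
import hashlib


def fingerprint(data: str, num_bytes: int = 8) -> int:
    """Return a fixed-length fingerprint (64 bits by default) of the given text.

    Applying it to page content corresponds to claim 44; applying it to the
    document address (URL) corresponds to claim 45. SHA-256 is an assumption.
    """
    digest = hashlib.sha256(data.encode("utf-8")).digest()
    return int.from_bytes(digest[:num_bytes], "big")
```

For example, `fingerprint(page_text)` would serve as a content identifier and `fingerprint(url)` as an address identifier. Both names are hypothetical.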

46. The computer program product of claim 40, wherein the generating instructions 
include 

sorting the requesting document and the selected set of documents in accordance with 
a metric included in score information of the requesting document and selected set of 
documents; and 

selecting a new set of documents, having at most a predefined number of documents, 
from the requesting document and the selected set of documents based on the sorting result. 
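The generating step of claim 46 amounts to sorting the candidates and keeping a bounded number of them. The following is a minimal sketch. It assumes that the metric in the score information is the document rank and that documents are (address, rank) pairs. The parameter `max_size` stands in for the claim's unspecified predefined number.

```python
from typing import List, Tuple

Doc = Tuple[str, float]  # hypothetical representation: (address, document rank)


def generate_new_set(requesting: Doc, selected: List[Doc], max_size: int) -> List[Doc]:
    """Sort requesting + selected documents by rank (descending) and keep at most max_size."""
    candidates = [requesting, *selected]
    ranked = sorted(candidates, key=lambda doc: doc[1], reverse=True)
    return ranked[:max_size]
```

`heapq.nlargest` would avoid a full sort for large sets. `sorted()` is used here because it mirrors the claim's explicit sorting step.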

47. The computer program product of claim 40, wherein 

the score information for each document includes a document rank; and 
the identifying instructions include 

comparing the document rank of the requesting document with that of a particular 
document from the selected set of documents in accordance with a set of predefined 
comparison criteria, wherein the particular document was previously determined to be the 
representative document for the selected set of documents; 

selecting the requesting document as the representative document for the new set of 
documents if the set of predefined comparison criteria are met; and 

keeping the particular document as the representative document for the new set of 
documents if the set of predefined comparison criteria is not met. 

48. The computer program product of claim 47, wherein the set of predefined comparison 
criteria comprise at least two parameters, one parameter for comparison with an absolute 
difference of document rank between the requesting document and the particular document, 
and another parameter for comparison with a ratio of document rank between the requesting 
document and the particular document. 

49. The computer program product of claim 40, wherein a document is a temporary 
redirect page comprising a document content, a source document address, and a target 
document address. 

50. A computer program product for use in conjunction with a computer system, the 
computer program product comprising a computer readable storage medium and a computer 
program mechanism embedded therein, the computer program mechanism comprising: 

instructions for constructing a plurality of tables, each table corresponding to a 
portion of a document address space, storing information identifying documents having a 
same document identifier and each identified document having an associated document rank; 

instructions for receiving a newly crawled document, such document characterized by 
a document identifier and a document rank; 

instructions for reading information stored in the plurality of tables to identify a set of 
documents sharing the document identifier of the newly crawled document, and ascertaining 
an original representative document for the identified set of documents; 

instructions for updating the information stored in at least one of the tables in 
accordance with the document ranks of the identified set of documents and the newly crawled 
document; 

instructions for determining a representative document for the newly crawled 
document and the identified set of documents; 

instructions for indexing the representative document when said representative 
document is the newly crawled document; and 

instructions for repeating the receiving, reading, updating, determining and indexing 
operations with respect to a plurality of newly crawled documents, each of which shares a 
respective document identifier with a respective set of documents, such that at least some of 
the newly crawled documents are determined to be representative documents and are 
indexed. 

51. The computer program product of claim 50, wherein information identifying the 
identified set of documents, including a particular document serving as the original 
representative document of the identified set, is stored in one or more tables. 

52. The computer program product of claim 51, wherein the determining includes 
comparing the document rank of the newly crawled document with that of the 
particular document from the identified set in accordance with a set of predefined comparison 
criteria; 

selecting the newly crawled document as the representative document if the set of 
predefined comparison criteria are met; and 

keeping the particular document as the representative document if the set of 
predefined comparison criteria is not met. 

53. The computer program product of claim 51, wherein the set of predefined comparison 
criteria comprise at least two parameters, one parameter for comparison with an absolute 
difference of document ranks between the newly crawled document and the particular 
document, and another parameter for comparison with a ratio of document ranks between the 
newly crawled document and the particular document. 

54. The computer program product of claim 50, wherein the updating includes inserting 
information identifying the newly crawled document into the at least one table only when a 
predefined insertion condition is satisfied. 

55. The computer program product of claim 50, wherein the predefined insertion 
condition is that the document rank of the newly crawled document is higher than the 
document rank of at least one document in the identified set of documents. 

56. A computer program product of detecting duplicate documents for use in conjunction 
with a computer system, the computer program product comprising a computer readable 
storage medium and a computer program mechanism embedded therein, the computer 
program mechanism comprising: 

instructions for constructing a plurality of tables, each table corresponding to a 
segment of a document address space, storing information identifying documents having a 
same document identifier and each identified document having an associated document rank, 
wherein the plurality of tables comprise N+1 tables where N is an integer greater than one, 
wherein the N+1 tables comprise N tables, each generated during a respective phase of a set 
of N crawling phases, and a current table generated during a current one of the N crawling 
phases, wherein an oldest one of the N tables was generated during a previous instance of the 
current crawling phase; 

instructions for receiving a newly crawled document, such document characterized by 
a document identifier and a document rank; 

instructions for reading information stored in the N+1 tables to identify a set of 
documents sharing the document identifier of the newly crawled document, and ascertaining 
an original representative document for the identified set of documents; 

instructions for updating the information stored in the current table in accordance with 
the document rankings of the identified set of documents and the newly crawled document; 

instructions for determining a representative document for the newly crawled 
document and the identified set of documents; 

instructions for indexing the representative document when said representative 
document is the newly crawled document; 

instructions for repeating the receiving, reading, updating, determining and indexing 
operations with respect to a plurality of newly crawled documents, each of which shares a 
respective document identifier with a respective set of documents, such that at least some of 
the newly crawled documents are determined to be representative documents and are 
indexed; and 

instructions for retiring the oldest one of the N tables upon completion of the current 
crawling phase. 

57. The computer program product of claim 56, wherein the reading comprises reading 
from a merged table that stores information from a plurality of the N tables, and reading from 
the current table. 

58. The computer program product of claim 56, wherein the identified set of documents, 
including a particular document serving as the original representative document of the 
identified set, is stored in one or more tables. 
IX. Evidence Appendix

For this appeal, Appellants do not rely on any evidence submitted pursuant to §§ 1.130, 
1.131, or 1.132, or other evidence entered by the Examiner. 
X. Related Proceedings Appendix

Appellants are aware of no related proceedings. 
XI. Proposed Amendment to Claim 40

40. (Currently amended) A computer program product for use in conjunction with a 
computer system, the computer program product comprising a computer readable storage 
medium and a computer program mechanism embedded therein, the computer program 
mechanism comprising: 

instructions for constructing a plurality of data structures for storing information of 
documents, each document characterized by a document identifier and a document rank, the 
information stored in the plurality of data structures include the document identifier and a 
document rank for each document; 

instructions for receiving a requesting document in association with its document 
identifier and document rank; 

instructions for selecting from the plurality of data structures a set of documents 
sharing the same document identifier as the requesting document, and ascertaining an original 
representative document for the identified set of documents; 

instructions for generating a new set of documents from the requesting document and 
the selected set of documents in accordance with their document rank; 

instructions for identifying a representative document of the new set of documents; 

instructions for indexing the representative document when said representative 
document is the newly crawled requesting document; and 

instructions for repeating the receiving, reading, updating, determining selecting, 
generating, identifying, and indexing operations with respect to a plurality of newly crawled 
requesting documents, each of which shares a respective document identifier with a 
respective set of documents, such that at least some of the newly crawled requesting 
documents are determined to be representative documents and are indexed. 